6/14/2018

  1. Get text of interview:
    xml2 package (read_xml), rvest

  2. Pull author names from text:
    Regular expressions (!!!) - stringr

    str_extract_all(string = btblines[x], pattern = paste0(
      "((?<![“])([:upper:]{1}(\\. )?)+[:lower:]+",
      "(?=([ \\’\\'-][:upper:]{1}(\\. )?)+)",
      "(?:[\\s\\’\\'-][:upper:]{1}(\\. )?",
      "[[:upper:]{1}([:lower:]\\'+)-]+)+)"))

  3. Connect to Goodreads & Wikipedia APIs for gender & birthdate:
    xml2, WikipediR, tidytext

  4. Analyze: dplyr, ggplot2
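
A minimal sketch of steps 1 and 2, to show how the pieces could fit together. It is not the code used for this project: the HTML snippet, the variable names (html, lines, name_pattern, authors) and the simplified pattern are all assumptions made for illustration. The pattern only catches runs of two or more capitalized words, so it misses initials, hyphens and apostrophes that the full expression in step 2 handles.

```r
library(rvest)    # read_html(), html_elements(), html_text()
library(stringr)  # str_extract_all()

# Hypothetical stand-in for a downloaded interview page (assumption:
# the interview text sits in <p> tags).
html <- paste0(
  "<html><body>",
  "<p>I just reread James Baldwin and Zadie Smith.</p>",
  "<p>Nothing beats George Eliot.</p>",
  "</body></html>"
)

# Step 1: parse the page and keep the text of each paragraph.
page  <- read_html(html)
lines <- html_text(html_elements(page, "p"))

# Step 2: simplified name pattern (NOT the full expression above):
# a capitalized word followed by one or more capitalized words.
name_pattern <- "[[:upper:]][[:lower:]]+(?: [[:upper:]][[:lower:]]+)+"

# str_extract_all() returns one character vector per input line;
# unlist() flattens them into a single vector of candidate names.
authors <- unlist(str_extract_all(lines, name_pattern))
print(authors)
```

Single capitalized words such as a sentence-initial "Nothing" are skipped because the pattern requires at least two capitalized words in a row.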

Top Authors Mentioned Together

George Saunders & Lin-Manuel Miranda
Ann Patchett & Zadie Smith
Colson Whitehead & James Baldwin
Colson Whitehead & Zadie Smith
David Sedaris & George Saunders
Elena Ferrante & George Eliot
Ernest Hemingway & F. Scott Fitzgerald
James Baldwin & Ta-Nehisi Coates
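
One way pairs like these could be tallied, sketched under assumptions: the notes do not say how co-mentions were counted, so this assumes a pair counts once per interview in which both authors appear. The mentions table, its column names and the placeholder authors are hypothetical, and the counts it produces are toy values, not results from this project.

```r
library(dplyr)

# Hypothetical input: one row per (interview, author) mention.
mentions <- tibble(
  interview = c(1, 1, 1, 2, 2, 3, 3),
  author    = c("Author C", "Author A", "Author B",
                "Author C", "Author A",
                "Author B", "Author A")
)

# For each interview, build every unordered pair of distinct authors.
pair_list <- lapply(split(mentions$author, mentions$interview), function(a) {
  a <- sort(unique(a))              # sort so each pair has one fixed order
  if (length(a) < 2) return(NULL)   # an interview with one author has no pairs
  m <- t(combn(a, 2))               # one row per pair
  data.frame(author_a = m[, 1], author_b = m[, 2], stringsAsFactors = FALSE)
})

# Count in how many interviews each pair appears, most frequent first.
pairs <- bind_rows(pair_list) %>%
  count(author_a, author_b, sort = TRUE, name = "n_interviews")

print(pairs)
```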